Abstract: In today’s digital environment, text databases are growing rapidly due to the widespread use of the Internet and other communication media. Various text mining techniques are used for knowledge discovery and information retrieval. In many applications, text documents are accompanied by side information. This side information may be metadata associated with the text, such as authors, co-authors, or citation networks, document provenance information, web links, or other kinds of data that provide additional insight into the documents. Such side information can contain a tremendous amount of information useful for clustering, and using it in the categorization process can yield more refined clusters. However, side information may sometimes be noisy, resulting in incorrect categorization that degrades the quality of the clustering. Therefore, a new approach for mining text data with side information is proposed, which combines a partitioning approach with a probabilistic estimation model.
Keywords: Text data mining, categorization, side information, clustering.